10 research outputs found

    Computational Methods and Graphical Processing Units for Real-time Control of Tomographic Adaptive Optics on Extremely Large Telescopes.

    Get PDF
    Ground based optical telescopes suffer from limited imaging resolution as a result of the effects of atmospheric turbulence on the incoming light. Adaptive optics technology has so far been very successful in correcting these effects, providing nearly diffraction limited images. Extremely Large Telescopes will require more complex Adaptive Optics configurations that introduce the need for new mathematical models and optimal solvers. In addition, the amount of data to be processed in real time is also greatly increased, making the use of conventional computational methods and hardware inefficient, which motivates the study of advanced computational algorithms, and implementations on parallel processors. Graphical Processing Units (GPUs) are massively parallel processors that have so far demonstrated a very high increase in speed compared to CPUs and other devices, and they have a high potential to meet the real-time restrictions of adaptive optics systems. This thesis focuses on the study and evaluation of existing proposed computational algorithms with respect to computational performance, and their implementation on GPUs. Two basic methods, one direct and one iterative are implemented and tested and the results presented provide an evaluation of the basic concept upon which other algorithms are based, and demonstrate the benefits of using GPUs for adaptive optics

    Improved Acceleration of the GPU Fourier Domain Acceleration Search Algorithm

    Full text link
    We present an improvement of our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs) (Dimoudi & Armour 2015; Dimoudi et al. 2017). Our new improved convolution code which uses our custom GPU FFT code is between 2.5 and 3.9 times faster the than our cuFFT-based implementation (on an NVIDIA P100) and allows for a wider range of filter sizes then our previous version. By using this new version of our convolution code in FDAS we have achieved 44% performance increase over our previous best implementation. It is also approximately 8 times faster than the existing PRESTO GPU implementation of FDAS (Luo 2013). This work is part of the AstroAccelerate project (Armour et al. 2002), a many-core accelerated time-domain signal processing library for radio astronomy.Comment: proceeding from ADASS XXVII conference, 4 page

    GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory

    Get PDF
    We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming language), which exploit GPU shared memory, allowing for GPU accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm utilizing the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared-memory-based FFT, we can achieved significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs

    A GPU implementation of the Correlation Technique for Real-time Fourier Domain Pulsar Acceleration Searches

    Full text link
    The study of binary pulsars enables tests of general relativity. Orbital motion in binary systems causes the apparent pulsar spin frequency to drift, reducing the sensitivity of periodicity searches. Acceleration searches are methods that account for the effect of orbital acceleration. Existing methods are currently computationally expensive, and the vast amount of data that will be produced by next generation instruments such as the Square Kilometre Array (SKA) necessitates real-time acceleration searches, which in turn requires the use of High Performance Computing (HPC) platforms. We present our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs). The correlation technique is applied as a convolution with multiple Finite Impulse Response filters in the Fourier domain. Two approaches are compared: the first uses the NVIDIA cuFFT library for applying Fast Fourier Transforms (FFTs) on the GPU, and the second contains a custom FFT implementation in GPU shared memory. We find that the FFT shared memory implementation performs between 1.5 and 3.2 times faster than our cuFFT-based application for smaller but sufficient filter sizes. It is also 4 to 6 times faster than the existing GPU and OpenMP implementations of FDAS. This work is part of the AstroAccelerate project, a many-core accelerated time-domain signal processing library for radio astronomy.Comment: 20 pages, 9 figures. Accepted for publication in ApJ

    Bits Missing: Finding Exotic Pulsars Using bfloat16 on NVIDIA GPUs

    Get PDF
    The Fourier domain acceleration search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy data sets. This paper quantifies the sensitivity impact of reducing numerical precision in the graphics processing unit (GPU)-accelerated FDAS pipeline of the AstroAccelerate (AA) software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU-accelerated fast Fourier transforms. AA has been modified to use bfloat16 (and IEEE-754 double-precision to provide a “gold standard” comparison) within the Fourier domain convolution section of the FDAS routine. Approximately 20,000 synthetic pulsar filterbank files representing binary pulsars were generated using SIGPROC with a range of physical parameters. They have been processed using bfloat16, single-precision, and double-precision convolutions. All bfloat16 peaks are within 3% of the predicted signal-to-noise ratio of their corresponding single-precision peaks. Of 14,971 “bright” single-precision fundamental peaks above a power of 44.982 (our experimentally measured highest noise value), 14,602 (97.53%) have a peak in the same acceleration and frequency bin in the bfloat16 output plane, while in the remaining 369 the nearest peak is located in the adjacent acceleration bin. There is no bin drift measured between the single- and double-precision results. The bfloat16 version of FDAS achieves a speedup of approximately 1.6× compared to single-precision. A comparison between AA and the PRESTO software package is presented using observations collected with the GMRT of PSR J1544+4937, a 2.16 ms black widow pulsar in a 2.8 hr compact orbit

    HARMONI at ELT: overview of the capabilities and expected performance of the ELT's first light, adaptive optics assisted integral field spectrograph.

    No full text
    International audienc
    corecore